Experimental Neurology

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Experimental Neurology's content profile, based on 57 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Spatial Decomposition of Longitudinal RNFL Maps Reveals Distinct Modes of Glaucomatous Progression with Structure Function and Genetic Signatures

Chen, L.; Zhao, Y.; Moradi, M.; Eslami, M.; Wang, M.; Elze, T.; Zebardast, N.

2026-04-11 health informatics 10.64898/2026.04.09.26350387 medRxiv
Top 0.1%
10.2%

Purpose: To determine whether spatial decomposition of longitudinal retinal nerve fiber layer (RNFL) change maps reveals distinct modes of glaucomatous progression masked by conventional averaging, and to validate these modes through structure-function mapping and genetic association analysis. Methods: Pixel-wise RNFL rates of change were computed from longitudinal optic disc OCT scans of 15,242 eyes (8,419 adults with primary open-angle glaucoma [POAG]; Massachusetts Eye and Ear, 1998 to 2023). A loss-only constraint zeroed all thickening values, reflecting the biological prior that adult RNFL does not regenerate. Nonnegative matrix factorization decomposed these maps into spatial progression components (80% training set). Components were evaluated in a held-out set (20%) for retinotopic structure-function concordance, visual field (VF) progressor classification against global and quadrant RNFL rates, and enrichment of genetic association signals at established POAG loci. Results: Six anatomically distinct progression patterns emerged, including diffuse circumferential loss, focal peripapillary defects, and arcuate bundle degeneration. Pattern-based models significantly outperformed global RNFL rate for classifying VF progressors (area under the curve, 0.750 [95% CI, 0.709 to 0.790] vs. 0.702; P = .0096) and explained additional variance in functional decline (Nagelkerke pseudo-R², 0.301 vs. 0.198; P = .0011). Structure-function mapping confirmed retinotopic coherence. Spatial phenotypes recovered stronger genetic signals than global rates at 85.3% of established POAG loci, suggesting they capture more biologically homogeneous endophenotypes of progression. Conclusions: Glaucomatous structural progression occurs through spatially distinct modes with independent structure-function and genetic signatures that conventional RNFL averaging obscures.
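
As a rough illustration of the decomposition step described in this abstract, the sketch below applies a loss-only constraint to a toy matrix of RNFL rates (eyes by pixels) and factorizes it with nonnegative matrix factorization. It is not the authors' pipeline: the multiplicative-update solver, the function names, the toy numbers, and the choice to store loss as a positive magnitude are all assumptions made for illustration.

```python
import random

def loss_only(rates):
    """Zero every thickening (positive) rate; store thinning as a magnitude >= 0."""
    return [[-r if r < 0 else 0.0 for r in row] for row in rates]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factorize V (n x m, nonnegative) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical rates of change: 4 eyes x 6 pixels, negative = thinning.
rates = [
    [-1.0, -0.8, 0.2, 0.0, 0.1, 0.0],
    [-0.9, -1.1, 0.0, 0.3, 0.0, 0.0],
    [0.1, 0.0, 0.0, -0.7, -1.2, -0.9],
    [0.0, 0.2, 0.0, -1.0, -0.8, -1.1],
]
loss = loss_only(rates)
w, h = nmf(loss, rank=2)  # rows of h: spatial components; rows of w: per-eye loadings
```

Each row of `h` plays the role of a spatial progression pattern and each row of `w` gives one eye's loading on those patterns; a real analysis would likely use an established NMF implementation and image-sized maps.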

2
Validated Synthetic Data Generation from a Multicenter Spine Surgery Registry: Methodology and Benchmark

Challier, V.; Jacquemin, C.; Diebo, B.; Dehouche, N.; Denisov, A.; Cristini, J.; Campana, M.; Castelain, J.-E.; Lonjon, G.; Lafage, V.; Ghailane, S.; SpineDAO Collaborative Group

2026-04-11 health informatics 10.64898/2026.04.07.26350316 medRxiv
Top 1%
0.8%

Background: Synthetic data have emerged as a complementary strategy for secondary use of clinical registries, enabling data sharing without patient-level exposure. In spine surgery, multicenter data sharing is constrained by institutional governance and patient privacy regulations. Validated synthetic data generation may enable broader access to surgical outcomes data for artificial intelligence development without compromising patient confidentiality. Objective: To describe and benchmark a three-domain validated synthetic data pipeline applied to a multicenter, tokenized spine surgery registry (SpineBase), and to establish a reproducible certification framework for synthetic spine surgery datasets. Methods: We extracted 125 sacroiliac joint fusion cases from the SpineBase registry (SIBONE study, IRB-SOFCOT approval Ref. 14-2025; CNIL MR-004 Ref. 2234503 v 0). A GaussianCopula generative model was trained on 52 structured variables spanning demographics, preoperative assessments, operative details, and longitudinal outcomes at 3, 6, 12, and 24 months. Synthetic datasets of 100, 1,000, and 10,000 patients were generated. Validation followed a three-domain framework: (1) fidelity, assessed by Kolmogorov-Smirnov tests and Jensen-Shannon divergence; (2) utility, assessed by train-on-synthetic, test-on-real (TSTR) methodology; and (3) privacy, assessed by nearest-neighbor distance ratio (NNDR), membership inference attack, and k-anonymity proxy. Results: All three validation gates passed. Fidelity: mean KS p-value 0.52 (threshold >0.05). Privacy: NNDR >1.0 in 98.9% of synthetic records; membership inference AUROC 0.57. Utility: 12-month Oswestry Disability Index prediction yielded Pearson r = 0.29, consistent with expected attenuation at N = 125. A SHA-256 cryptographic hash of each certified dataset was anchored on the Solana blockchain for immutable provenance. Conclusions: A validated, blockchain-anchored synthetic data pipeline for spine surgery registries is technically feasible and meets current publication-standard criteria for fidelity and privacy. Utility metrics scale with registry size, creating a direct incentive for multicenter data contribution. This framework provides a reproducible methodology for synthetic data certification in spine surgery research, and establishes certified synthetic datasets as a privacy-native substrate for expert-annotation pipelines, as demonstrated in the companion Spine Reviews study.
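
The two fidelity measures named in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it computes the two-sample Kolmogorov-Smirnov statistic (not the p-value the abstract reports) and the base-2 Jensen-Shannon divergence between two already-binned distributions. The function names are hypothetical and are not taken from the SpineBase pipeline.

```python
import math

def ks_statistic(real, synth):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synth)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q are probability vectors over the same bins; the result lies in [0, 1].
    """
    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Both measures are zero when the real and synthetic distributions coincide and grow as they diverge, which is why they suit a pass/fail fidelity gate; the thresholds themselves are a design choice of the certifying framework.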

3
Longitudinal MAP-MRI-based Assessment of Tissue Microstructural Alterations in Acute mTBI

Gangolli, M.; Perkins, N. J.; Marinelli, L.; Basser, P. J.; Avram, A. V.

2026-04-13 radiology and imaging 10.64898/2026.04.06.26350074 medRxiv
Top 1%
0.7%

BACKGROUND: Mild traumatic brain injury (mTBI) is a signature injury in civilian and military populations that remains invisible to detection by conventional radiological methods. Diffusion MRI has been identified as a potential clinical tool for revealing subtle microstructural alterations associated with mTBI. OBJECTIVE: This study evaluates whether a comprehensive and powerful diffusion MRI (dMRI) technique called mean apparent propagator (MAP) MRI can detect sequelae of mTBI. METHODS: We analyzed data from 417 participants of the GE/NFL prospective mTBI study which included 143 matched controls (mean age, 21.9 ± 8.3 years; 76 women) and 274 patients with acute mTBI and GCS ≥ 13 (mean age, 21.9 ± 8.5 years; 131 women). All participants underwent MRI exams at up to four visits including structural high-resolution T1W, T2W, FLAIR-T2W, and dMRI, in addition to clinical assessments of post-concussive physical symptoms (RPQ-3), psychosocial functioning and lifestyle symptoms (RPQ-13), and postural stability (BESS). The dMRI data for each subject were co-registered across all visits and analyzed using the MAP-MRI framework to measure and map the distribution of net microscopic displacements of diffusing water molecules in tissue and ultimately compute the microstructural MAP-MRI tissue parameters including propagator anisotropy (PA), non-Gaussianity (NG), return-to-origin probability (RTOP), return-to-axis probability (RTAP), and return-to-plane probability (RTPP). We quantified voxel-wise and region-of-interest (ROI)-based changes in these parameters across all four visits. RESULTS: MAP-MRI parameter values were within the expected ranges and showed relatively little variation across visits. We found no significant differences in the longitudinal trajectories of these parameters between mTBI patients and controls. At acute post-injury timepoints, RPQ-3 and RPQ-13 scores were increased in mTBI patients relative to controls, while BESS scores were not significantly different between groups. Analysis of dMRI metrics and clinical mTBI markers showed significant correspondence between MAP-MRI metrics in cortical gray matter, caudate and pallidum and BESS scores. CONCLUSION: We developed and tested a state-of-the-art quantitative image processing pipeline for sensitive analysis and detection of subtle tissue changes in longitudinal clinical diffusion MRI data. The absence of a significant statistical difference between populations in the dMRI parameters in this study suggests that the mTBI corresponded to acute post-injury clinical symptoms but that the injury was not severe enough to cause detectable microstructural damage/alterations, and that increased diffusion sensitization combined with improved analysis techniques may be needed. CLINICAL IMPACT: These findings suggest that acute mTBI (GCS ≥ 13) may not be detectable with diffusion MRI. TRIAL REGISTRATION: ClinicalTrials.gov NCT02556177

4
Wearable sleep staging using photoplethysmography and accelerometry across sleep apnea severity: a focus on very severe sleep apnea

Ogaki, S.; Kaneda, M.; Nohara, T.; Fujita, S.; Osako, N.; Yagi, T.; Tomita, Y.; Ogata, T.

2026-04-13 health informatics 10.64898/2026.04.09.26350266 medRxiv
Top 2%
0.4%

Study Objectives: To evaluate wearable sleep staging across sleep apnea severity, including very severe sleep apnea defined as an apnea-hypopnea index (AHI) ≥ 50 events/h, and to assess how training-set composition affects performance in this subgroup. Methods: We analyzed 552 overnight recordings, 318 from the Sleep Lab Dataset and 234 from the Hospital Dataset. In the Hospital Dataset, 26.5% had very severe sleep apnea. We developed a deep learning model for sleep staging using RR intervals from wrist-worn photoplethysmography and three-axis accelerometry. Baseline performance was assessed by cross-validation under 5-stage and 4-stage staging. We examined night-level associations with AHI severity. We also compared the baseline model with an ablation model trained on the same number of recordings but with more Sleep Lab Dataset and lower-AHI Hospital Dataset recordings, evaluating both models in the very severe subgroup. Results: In 5-stage classification, Cohen's kappa was 0.586 in the Sleep Lab Dataset and 0.446 in the Hospital Dataset. Under 4-stage staging, the gap narrowed, with kappa values of 0.632 and 0.525, respectively. In the Hospital Dataset, performance declined with increasing AHI severity. Among 62 recordings with very severe sleep apnea, reducing high-AHI representation in training lowered kappa from 0.365 to 0.303. Conclusions: Wearable sleep staging performance declined across greater sleep apnea severity in this clinical cohort. Clinical utility may benefit from training data that better represent the target severity spectrum and from selecting staging granularity to match the intended use case. Statement of Significance: Repeated laboratory polysomnography is impractical for long-term sleep apnea management. Wearable sleep staging could support scalable monitoring, yet its reliability in clinically severe sleep apnea has remained unclear. This study developed and evaluated a wearable sleep staging approach in both sleep-laboratory and hospital cohorts. The hospital cohort included many severe and very severe cases. Performance was lower in the hospital cohort and declined with greater sleep apnea severity. A coarser staging scheme reduced the gap between cohorts, and models trained without representative very severe cases performed worse in this target population. These findings highlight the value of severity-aware model development and motivate future multi-night home validation with reliability cues.
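
Because this abstract reports agreement as Cohen's kappa under two staging granularities, a small worked sketch may help show why a coarser scheme can raise kappa. The code below is illustrative only: the epoch labels are made up, and the 4-stage mapping (merging N1 and N2 into a single "Light" stage) is an assumption, since the abstract does not say which stages were merged.

```python
from collections import Counter

def cohens_kappa(reference, predicted):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Agreement expected by chance, from the two marginal label frequencies.
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use one label
    return (observed - expected) / (1 - expected)

def to_four_stage(labels):
    """Hypothetical 4-stage mapping: merge N1 and N2 into 'Light'."""
    return ["Light" if s in ("N1", "N2") else s for s in labels]

# Made-up epoch labels; the only disagreements are N1/N2 confusions.
reference = ["W", "N1", "N2", "N3", "REM", "N2", "N2", "W"]
predicted = ["W", "N2", "N2", "N3", "REM", "N1", "N2", "W"]

kappa_5 = cohens_kappa(reference, predicted)
kappa_4 = cohens_kappa(to_four_stage(reference), to_four_stage(predicted))
```

In this toy case the N1/N2 confusions vanish once the two stages are merged, so the 4-stage kappa exceeds the 5-stage kappa, mirroring the direction of the effect reported above.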

5
Democratizing Scientific Publishing: A Local, Multi-Agent LLM Framework for Objective Manuscript Editing

Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.

2026-04-17 health informatics 10.64898/2026.04.13.26350761 medRxiv
Top 2%
0.4%

Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.

6
Attitudes and Perceptions of Generative Artificial Intelligence Chatbots in the Scientific Process of Traditional, Complementary, and Integrative Medicine Research: A Large-Scale, International Cross-Sectional Survey

Ng, J. Y.; Tan, J.; Syed, N.; Adapa, K.; Gupta, P. K.; Li, S.; Mehta, D.; Ring, M.; Shridhar, M.; Souza, J. P.; Yoshino, T.; Lee, M. S.; Cramer, H.

2026-04-15 health informatics 10.64898/2026.04.13.26350612 medRxiv
Top 3%
0.3%

Background: Generative artificial intelligence (GenAI) chatbots have shown utility in assisting with various research tasks. Traditional, complementary, and integrative medicine (TCIM) is a patient-centric approach that emphasizes holistic well-being. The integration of TCIM and GenAI presents numerous key opportunities. However, TCIM researchers' attitudes toward GenAI tools remain poorly understood. This large-scale, international cross-sectional survey aimed to elucidate the attitudes and perceptions of TCIM researchers regarding the use of GenAI chatbots in the scientific process. Methods: A search strategy in Ovid MEDLINE identified corresponding authors who were TCIM researchers. Eligible authors were invited to complete an anonymous online survey administered via SurveyMonkey. The survey included questions on socio-demographic characteristics, familiarity with GenAI chatbots, and perceived benefits and challenges of using GenAI chatbots. Results were analysed using descriptive statistics and thematic content analysis. Results: The survey received 716 responses. Most respondents reported familiarity with GenAI chatbots (58.08%) and viewed them as very important to the future of scientific research (54.37%). The most acknowledged benefits included workload reduction (74.07%) and increased efficiency in data analysis/experimentation (71.14%). The most frequently reported challenges involved bias, errors, and limitations. More than half of the respondents (57.02%) expressed a need for training to use GenAI chatbots in the scientific process, alongside an interest in receiving training (72.07%). However, 43.67% indicated that their institutions did not offer these programs. Discussion: A deeper understanding of TCIM researchers' perspectives can inform future AI applications in this field and guide future policies and collaboration among researchers.

7
Walking to the beat: the impact of non-invasive brain stimulation and music on gait in Parkinson's Disease

Emerick, M.; Grahn, J. A.

2026-04-13 rehabilitation medicine and physical therapy 10.64898/2026.04.08.26350408 medRxiv
Top 3%
0.3%

Walking impairments in Parkinson's disease (PD), including reduced speed, cadence, and stride length, and increased variability, impair mobility and raise fall risk. Conventional treatments may fail to address these deficits, underscoring the need for complementary non-invasive alternatives. This study examined whether combining rhythmic auditory cueing with transcranial direct current stimulation (tDCS) over the supplementary motor area (SMA), a critical region for internally generated movement, would enhance gait performance in PD. Thirty-three participants with PD and thirty-two healthy controls completed two sessions (anodal vs. sham tDCS) with gait assessed during stimulation, immediately after stimulation, and 15 minutes after stimulation under two auditory conditions: walking in silence and walking to music paced 10% faster than baseline cadence. Spatiotemporal, variability, and stability gait parameters were analyzed using linear mixed-effects models. Rhythmic auditory cueing significantly increased cadence and speed during, immediately after, and especially 15 minutes after stimulation, suggesting sustained effects of rhythmic entrainment. Anodal tDCS produced faster cadence, as well as lower stride time variability and stride width, particularly in individuals with PD. Although both music and anodal tDCS affected gait, no interaction was observed, indicating independent effects. Individuals with PD had greater gait variability overall, and adjusted temporal gait parameters less to music than healthy controls did. Anodal stimulation reduced walking variability in PD, reducing the group differences observed under sham conditions. These findings suggest that rhythmic cueing and SMA stimulation target complementary mechanisms, highlighting the promise of combined tDCS-music interventions for gait rehabilitation in PD.

8
LLM-Driven Target Trial Emulation with Human-in-the-Loop Validation for Randomized Trial: Automated Protocol Extraction and Real-World Outcome Evaluation

Dey, S. K.; Qureshi, A. I.; Shyu, C.-R.

2026-04-13 health informatics 10.64898/2026.04.09.26350523 medRxiv
Top 3%
0.3%

Target trial emulation (TTE) enables causal inference from observational data but remains bottlenecked by manual, expert-dependent protocol operationalization. While large language models (LLMs) have advanced clinical knowledge extraction and code generation, their ability to automate end-to-end TTE workflows remains largely unexplored. We present an LLM-driven framework using retrieval-augmented generation to extract the five core TTE design parameters from the Carotid Revascularization and Medical Management for Asymptomatic Carotid Stenosis Trial (CREST-2) protocol and generate executable phenotyping pipelines for real-world EHR data. The performance of the framework was evaluated along two dimensions. First, protocol extraction accuracy was assessed against a gold-standard checklist of trial design components using precision, recall, and F1-score metrics. Second, outcome validity was evaluated through population-level concordance analyses comparing EHR-derived outcomes with published trial endpoints using standardized mean difference, observed-to-expected ratios, confidence interval overlap, and two-proportion z-tests. In addition, human-in-the-loop validation assessed the correctness of extracted clinical logic and phenotype definitions. Together, these evaluations demonstrate a structured approach for assessing LLM-driven protocol-to-pipeline translation for scalable real-world evidence generation.
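
Two of the evaluation metrics named in this abstract are simple enough to sketch directly. The example below is a minimal illustration, not the authors' code: it scores a set of extracted protocol components against a gold-standard checklist with precision, recall, and F1, and compares two event proportions with a pooled two-proportion z-test. All function names, component labels, and counts are hypothetical.

```python
import math

def precision_recall_f1(extracted, gold):
    """Score extracted checklist items against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # items both extracted and in the gold standard
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test is undefined when the pooled proportion is 0 or 1")
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical checklist with five components, three of which were extracted.
gold = {"a", "b", "c", "d", "e"}
extracted = {"a", "b", "c"}
precision, recall, f1 = precision_recall_f1(extracted, gold)

# Hypothetical event counts: EHR-derived cohort vs. published trial arm.
z, p_value = two_proportion_z(30, 100, 10, 100)
```

A non-significant z-test here would be read as concordance between the emulated and published event rates, which is the opposite of the usual hypothesis-testing goal and is one reason the abstract pairs it with other concordance measures.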

9
A multimodal AI model for modeling the genetic risk factor of Alzheimer's disease

Nguyen, T. M.; Woods, C.; Liu, J.; Wang, C.; Lin, A.-L.; Cheng, J.

2026-04-15 health informatics 10.64898/2026.04.13.26350803 medRxiv
Top 3%
0.3%

The apolipoprotein E ε4 (APOE4) allele is the strongest genetic risk factor for late-onset Alzheimer's disease (AD), the most common form of dementia. APOE4 carriers exhibit cerebrovascular and metabolic dysfunction, structural brain alterations, and gut microbiome changes decades before the onset of clinical symptoms. A better understanding of the early manifestation of these physiological changes is critical for the development of timely AD interventions and risk reduction protocols. Multimodal datasets encompassing a wide range of APOE4- and AD-associated biomarkers provide a valuable opportunity to gain insight into the APOE4 phenotype; however, these datasets often present analytical challenges due to small sample sizes and high heterogeneity. Here, we propose a two-stage multimodal AI model (APOEFormer) that integrates blood metabolites, brain vascular and structural MRI, microbiome profiles, and other clinical and demographic data to predict APOE4 allele status. In the first stage, modality-specific encoders are used to generate initial representations of input data modalities, which are aligned in a shared latent space via self-supervised contrastive learning during pretraining. This objective encourages the learning of informative and consistent representations across modalities by leveraging cross-modality relationships. In the second stage, the pretrained representations are used as inputs to a multimodal transformer that integrates information across modalities to predict a key AD risk genetic variant (APOE4). Across 10 independent experimental runs with different train-validation-test splits, APOEFormer predicts whether an individual carries an APOE4 allele with an average accuracy of 75%, demonstrating robust performance under limited sample sizes. Post hoc perturbation analysis of the predictive model revealed valuable insights into the driving components of the APOE4 phenotype, including key blood biomarkers and brain regions strongly associated with APOE4.

10
Apnea-hypopnea index estimation with wrist-worn photoplethysmography

Fonseca, P.; Ross, M.; van Meulen, F.; Asin, J.; van Gilst, M. M.; Overeem, S.

2026-04-11 health informatics 10.64898/2026.04.08.26350411 medRxiv
Top 4%
0.2%

Objective: Long-term monitoring of obstructive sleep apnea (OSA) severity may be relevant for several clinical applications. We developed a method for estimating the apnea-hypopnea index (AHI) using wrist-worn, reflective photoplethysmography (PPG). Approach: A neural network was developed to detect respiratory events using PPG and PPG-derived sleep stages as input. The development database encompassed retrospective data from three polysomnographic datasets (N=3111), including a dataset with concurrent reflective PPG recordings from a wrist-worn device (N=969). The model was pre-trained with (transmissive) finger-PPG signals from all overnight recordings and then fine-tuned to wrist-PPG characteristics using transfer learning. Validation was performed on the test portion of the development set and on a fourth, external hold-out dataset containing both wrist-PPG and PSG data (N=171). Performance was evaluated in terms of AHI estimation accuracy and OSA severity classification. Main Results: The fine-tuned wrist-PPG model demonstrated strong agreement with the PSG-derived gold-standard AHI, achieving intra-class correlation coefficients of 0.87 in the test portion of the development set and 0.91 in the external hold-out validation set. Diagnostic performance was high, with accuracies above 80% for all severity thresholds. Significance: The study highlights the potential of reflective PPG-based AHI estimation, achieving high estimation performance in comparison with PSG. These measurements can be performed with relatively comfortable sensors integrated in convenient wrist-worn wearables, enabling long-term assessment of sleep-disordered breathing, both in a diagnostic phase and during therapy follow-up.

11
SPLIT: Safety Prioritization for Long COVID Drug Repurposing via a Causal Integrated Targeting Framework

Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.

2026-04-16 health informatics 10.64898/2026.04.12.26350701 medRxiv
Top 4%
0.2%

Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.

12
A Modified Percutaneous Spinal Cord Stimulation Implant Approach to Target the Ventral Spinal Cord

Valestrino, K. J.; Ihediwa, C. V.; Dorius, G. T.; Conger, A. M.; Glinka-Przybysz, A.; McCormick, Z. L.; Fogarty, A. E.; Mahan, M. A.; Hernandez-Bello, J.; Konrad, P. E.; Burnham, T. R.; Dalrymple, A. N.

2026-04-13 surgery 10.64898/2026.04.06.26350176 medRxiv
Top 4%
0.2%

Objectives: Epidural spinal cord stimulation (SCS) is an emerging therapy for motor rehabilitation following spinal cord injury (SCI) and other motor disorders. Conventionally, SCS leads are placed along the dorsal spinal cord (SCSD), where stimulation activates large-diameter afferent fibers, which indirectly activate motoneurons through reflex pathways. This leads to broad activation of flexor and extensor muscles and limited fine-tuned control of motor output. Targeting the ventral spinal cord (SCSV) may enable more direct activation of motoneuron pools, potentially improving the specificity of muscle activation; however, there is currently no established method to place leads ventrally. To address this, we evaluated the feasibility of four modified percutaneous implantation techniques to target the ventrolateral thoracolumbar spinal cord. Materials and methods: Percutaneous SCSV implantation was performed in three human cadaver torso specimens under fluoroscopic guidance. The following approaches were evaluated: sacral hiatus, transforaminal, interlaminar contralateral, and interlaminar ipsilateral. The leads in the latter three approaches were inserted between L1 and L5. Eighteen implants were attempted, with nine leads retained for analysis. Lead and electrode position were assessed using computed tomography (CT) with three-dimensional reconstruction, along with anatomical dissection to verify lead and electrode placement within the epidural space. Results: Successful ventral epidural lead placement was achieved using all four implantation approaches. The sacral hiatus (16/16 electrodes) and transforaminal (8/8 electrodes) approaches resulted in exclusively ventrolateral placement. The interlaminar contralateral approach led to 27/32 electrodes positioned ventrolaterally and 5/32 dorsally. The interlaminar ipsilateral implantation approach led to 14/32 electrodes positioned ventrolaterally and 18/32 positioned ventromedially. Conclusions: These findings demonstrate that ventral epidural SCS lead placement can be achieved using modified percutaneous implant techniques. The four approaches outlined here provide a clinically feasible pathway to SCSV and establish a foundation for future clinical studies investigating SCSV for motor rehabilitation following SCI.

13
Medicalbench: Evaluating Large Language Models Towards Improved Medical Concept Extraction

Yang, Z.; Lyng, G. D.; Batra, S. S.; Tillman, R. E.

2026-04-16 health informatics 10.64898/2026.04.12.26350704 medRxiv
Top 5%
0.1%

Medical concept extraction from electronic health records underpins many downstream applications, yet remains challenging because medically meaningful concepts, such as diagnoses, are frequently implied rather than explicitly stated in medical narratives. Existing benchmarks with human-annotated evidence spans underscore the importance of grounding extracted concepts in medical text. However, they predominantly focus on explicitly stated concepts and provide limited coverage of cases in which medically relevant concepts must be inferred. We present MedicalBench, a new benchmark for medical concept extraction with evidence grounding that evaluates implicit medical reasoning. MedicalBench formulates medical concept extraction as a verification task over medical note-concept pairs, coupled with sentence-level evidence identification. Built from MIMIC-IV discharge summaries and human-verified ICD-10 codes, the dataset is curated through a multi-stage large language model (LLM) triage pipeline followed by medical annotation and expert review. It deliberately includes implicit positives, semantically confusable negatives, and cases where LLM judgments disagree with medical expert assessments. Annotators provide sentence-level evidence spans and concise medical rationales. The final dataset contains 823 high-quality examples. We define two complementary evaluation tasks: (1) medical concept extraction and (2) sentence-level evidence retrieval, enabling assessment of both correctness and interpretability. Benchmarking state-of-the-art LLMs and a supervised baseline reveals that performance remains modest, highlighting the difficulty of extracting implicitly expressed concepts. We further show that explicitly incorporating reasoning cues and prompting to extract implicit evidence substantially improves medical concept extraction, while performance is largely invariant to note length, indicating that MedicalBench isolates reasoning difficulty rather than superficial confounders. MedicalBench provides the first systematic benchmark for implicit, evidence-grounded medical concept extraction, offering a foundation for developing medical language models that can both identify medically relevant concepts and justify their predictions in a transparent and medically faithful manner.

14
Trade-offs in emergency transport protocols for access to hip fracture management: a geospatial analysis of selective versus standard transfer in Ontario long-term care

Yee, N. J.; Chen, T.; Huang, Y. Q.; Whyne, C.; Halai, M.

2026-04-14 orthopedics 10.64898/2026.04.12.26350713 medRxiv
Top 5%
0.1%

Objectives: For suspected hip fractures, prehospital protocols directing patients to an orthopaedic centre rather than the nearest emergency department (ED) could reduce time-to-surgery but may impact EMS travel burden. This study evaluates the impact of transfer protocols by quantifying transport to hospitals from long-term care (LTC) facilities across Ontario. Methods: A retrospective cross-sectional analysis of all Ontario LTC facilities and hospitals was performed. Two protocols were modeled: standard transfer to the nearest ED with subsequent transfer if required, and selective transfer, based on Collingwood Hip Fracture Rule prehospital screening, directly to the nearest ED with orthopaedic services (orthoED). Median one-way travel distances were calculated from Google Maps. Results: In Ontario, 15.4% of LTC residents require hospital destination decisions because their nearest ED lacks orthopaedic services; for these facilities, median distances were 2.7 km to the ED and 36.0 km to the orthoED. Among the 52 LTC facilities where selective transfer was distance-optimal, it substantially reduced travel for patients with hip fracture (31.1 km vs 49.6 km; P<.01) while only modestly increasing travel for patients without hip fracture. Where standard transfer was distance-optimal, little travel difference was noted for patients with hip fracture; however, patients with false-positive screens traveled significantly further to an orthoED. The greatest negative consequences of selective transfer fall on the 1.3% of residents living farthest (>100 km) from an orthoED. Conclusions: EMS direct transportation to hospitals with orthopaedics may improve hip fracture care but can increase EMS burden due to patients identified falsely as having a hip fracture, particularly in remote communities.
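The two transfer protocols compared in this abstract can be sketched as a small distance model. This is an illustrative sketch only, not the authors' analysis: the `Facility` fields, the function names, and the 35.0 km ED-to-orthoED leg are hypothetical, and it assumes that patients with a false-positive screen are not transported back. Only the 2.7 km and 36.0 km values come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    """One-way road distances (km) for a single LTC facility (hypothetical fields)."""
    km_to_ed: float           # LTC facility -> nearest ED
    km_to_ortho_ed: float     # LTC facility -> nearest ED with orthopaedics
    km_ed_to_ortho_ed: float  # nearest ED -> orthoED (secondary transfer)


def standard_km(f: Facility, has_fracture: bool) -> float:
    """Standard transfer: nearest ED first, onward transfer only if a fracture is confirmed."""
    return f.km_to_ed + (f.km_ed_to_ortho_ed if has_fracture else 0.0)


def selective_km(f: Facility, has_fracture: bool, screen_positive: bool) -> float:
    """Selective transfer: screen-positive patients go directly to the orthoED."""
    if screen_positive:
        return f.km_to_ortho_ed
    # Screen-negative patients follow the standard pathway; a missed fracture
    # still needs the secondary transfer.
    return standard_km(f, has_fracture)


# Median distances from the abstract; the ED -> orthoED leg is an assumed value.
facility = Facility(km_to_ed=2.7, km_to_ortho_ed=36.0, km_ed_to_ortho_ed=35.0)

true_positive_saving = standard_km(facility, True) - selective_km(facility, True, True)
false_positive_cost = selective_km(facility, False, True) - standard_km(facility, False)
```

Under these assumed numbers, a correctly screened fracture patient saves a short detour, while a false-positive patient travels much further than under standard transfer, which is the trade-off the abstract describes.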

15
GPR143, a novel immunohistochemical marker for renal tumors with FLCN/TSC/MTOR-TFE alterations

Li, Q.; Singh, A.; Hu, R.; Huang, W.; Shapiro, D. D.; Abel, E. J.; Zong, Y.

2026-04-13 pathology 10.64898/2026.04.06.26350070 medRxiv
Top 5%
0.1%

Although several ancillary tests are available in limited laboratories, diagnosis of microphthalmia (MiT)/TFE family translocation renal cell carcinoma (tRCC) can be challenging due to diverse and overlapping tumor morphology and the lack of reliable biomarkers. GPNMB has recently been identified as a diagnostic marker for various renal neoplasms with FLCN/TSC/mTOR-TFE alterations. However, the sensitivity and specificity of GPNMB immunostain are suboptimal, and interpreting results in ambiguous cases can be difficult. To search for additional biomarkers that could improve screening sensitivity and predict genetic aberrations in the FLCN/TSC/mTOR-TFE pathway in renal tumors, we performed bioinformatic analysis of publicly available cancer databases and found that GPR143, a transmembrane protein regulated by MiT transcription factors, was highly expressed in a subset of renal cell carcinomas (RCCs). In two kidney cancer cohorts from The Cancer Genome Atlas (TCGA), RCCs with high levels of GPR143 expression were enriched for renal neoplasms with FLCN/TSC/mTOR-TFE alterations. Similar to GPNMB labeling, GPR143 immunostain was positive in the majority of tRCC cases and renal tumors with FLCN/TSC/mTOR alterations, suggesting that GPR143 could function as another surrogate marker for FLCN/TSC/mTOR-TFE alterations in certain renal tumors. Interestingly, despite the concordant GPR143 and GPNMB immunoreactivity in most renal neoplasms with FLCN/TSC/mTOR-TFE alterations, diffuse GPR143 immunostain was observed in some cases with negative or focal GPNMB labeling. Taken together, our results indicate that GPR143 could serve as a useful adjunct marker to improve the sensitivity of screening for renal tumors with FLCN/TSC/mTOR-TFE alterations.

16
Analytical Choices Impact the Estimation of Rhythmic and Arrhythmic Components of Brain Activity

da Silva Castanheira, J.; Landry, M.; Fleming, S. M.

2026-04-11 neuroscience 10.1101/2025.09.24.678322 medRxiv
Top 5%
0.1%

Brain activity comprises both rhythmic (periodic) and arrhythmic (aperiodic) components. These signal elements vary across healthy aging and disease, and may make distinct contributions to conscious perception. Despite pioneering techniques to parameterize rhythmic and arrhythmic neural components based on power spectra, the methodology for quantifying rhythmic activity remains in its infancy. Previous work has relied on parametric estimates of rhythmic power extracted from specparam, or estimates of rhythmic power obtained after detrending neural spectra. Variation in analytical choices for isolating brain rhythms from background arrhythmic activity makes interpreting findings across studies difficult. Whether these current approaches can accurately recover the independent contribution of these neural signal elements remains to be established. Here, using simulation and parameter recovery approaches, we show that power estimates obtained from detrended spectra conflate these two neurophysiological components, yielding spurious correlations between spectral model parameters. In contrast, modelled rhythmic power obtained from specparam, which detrends the power spectra and parametrizes brain rhythms, independently recovers the rhythmic and arrhythmic components in simulated neural time series, minimising spurious relationships. We validate these methods using resting-state recordings from a large cohort. Based on our findings, we recommend modelled rhythmic power estimates from specparam for the robust independent quantification of rhythmic and arrhythmic signal components for cognitive neuroscience.
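The idea of a spectrum built from an arrhythmic trend plus a rhythmic peak can be illustrated with a toy, noise-free simulation. This is a hedged sketch in plain Python, not the authors' simulation and not the specparam fitting algorithm: it assumes log power is a straight line in log-log space plus one Gaussian peak, and all parameter values and function names are made up for illustration. It does not reproduce the conflation the abstract reports, which concerns how detrended estimates behave across realistic, varying spectra.

```python
import math


def simulate_log_power(freqs, offset, exponent, peak_height, peak_center, peak_sd):
    """Log10 power = aperiodic line (in log-log space) + one Gaussian peak."""
    spectrum = []
    for f in freqs:
        aperiodic = offset - exponent * math.log10(f)
        periodic = peak_height * math.exp(-((f - peak_center) ** 2) / (2 * peak_sd ** 2))
        spectrum.append(aperiodic + periodic)
    return spectrum


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical parameters: 1-40 Hz in 0.5 Hz steps, one alpha-like peak at 10 Hz.
freqs = [1.0 + 0.5 * i for i in range(79)]
log_power = simulate_log_power(freqs, offset=1.0, exponent=1.5,
                               peak_height=0.6, peak_center=10.0, peak_sd=1.0)

# Estimate the arrhythmic trend from frequencies well away from the peak,
# then subtract it ("detrending") to expose the rhythmic component.
keep = [i for i, f in enumerate(freqs) if abs(f - 10.0) > 4.0]
slope, intercept = fit_line([math.log10(freqs[i]) for i in keep],
                            [log_power[i] for i in keep])
detrended = [p - (intercept + slope * math.log10(f)) for f, p in zip(freqs, log_power)]

recovered_exponent = -slope
recovered_peak = max(detrended)
```

In this idealized case the trend and the peak separate cleanly because the peak location is known and there is no noise; real recordings offer neither guarantee.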

17
Spine Reviews: Crowdsourcing Global Spine Expert Knowledge via Digital Ledger Technology

Challier, V.; Diebo, B.; Lafage, V.; Dehouche, N.; Lonjon, G.; Cristini, J.; SpineDAO,

2026-04-13 health informatics 10.64898/2026.04.11.26350678 medRxiv
Top 5%
0.1%

Study Design: Prospective observational study using a novel digital ledger technology (DLT)-based crowdsourcing platform. Objective: To develop and evaluate Spine Reviews, a blockchain-based platform for aggregating spine treatment recommendations from an international specialist panel, and to validate the clinical coherence of the resulting dataset. Summary of Background Data: Predictive models for low back pain treatment are limited by small, homogeneous datasets that fail to capture inter-clinician variability. Traditional multi-center data collection is expensive, slow, and geographically constrained. DLT-based crowdsourcing with cryptographic credentialing may overcome these barriers. Methods: Five hundred synthetic patient vignettes (digital twins) were generated; 463 retained after quality control. A review platform was built on the Solana blockchain using non-transferable Soulbound Tokens (SBTs) for credentialing and smart-contract compensation. Fifty-two specialists from 7 countries provided 4+ reviews per vignette across four treatment tiers, without access to imaging or physical examination. Mixed-effects regression with reviewer random intercepts partitioned decision variability. Results: The platform collected 2,066 completed reviews (97.7%) over 37 days at USD 0.97/review. Variance decomposition revealed that 36.7% of treatment tier variability was attributable to patient presentation, 19.2% to reviewer practice style, and 44.1% to their interaction. Neurological deficits (beta=0.39), symptom duration (beta=0.12), and pain (beta=0.09) independently predicted treatment escalation (all p<0.001). Gwet's AC1 was almost perfect for emergency (0.92) and substantial for conservative decisions (0.67). Reviewer confidence in treatment recommendations decreased with escalating tier severity (conservative 4.59/5 vs surgical 4.05/5), suggesting appropriate uncertainty calibration. 
Conclusions: DLT with SBT credentialing enables rapid, global, cost-effective aggregation of clinically coherent expert judgment. The three-component variance structure quantifies clinical equipoise in spine care and establishes that predictive models require diverse, multi-reviewer training data. Keywords: digital ledger technology; blockchain; crowdsourcing; clinical decision-making; low back pain; Soulbound Tokens
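For readers unfamiliar with the agreement statistic reported above, Gwet's AC1 can be sketched for the simplest case. This sketch assumes exactly two raters and nominal categories, whereas the study collected four or more reviews per vignette, so it is not the authors' computation; the ratings below are invented.

```python
def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters over a fixed list of nominal categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")

    n = len(ratings_a)
    # Observed agreement: share of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: average of pi_k * (1 - pi_k) over categories, where
    # pi_k is the mean proportion of ratings in category k across both raters.
    chance = 0.0
    for c in categories:
        pi_k = (ratings_a.count(c) + ratings_b.count(c)) / (2 * n)
        chance += pi_k * (1 - pi_k)
    chance /= len(categories) - 1

    return (observed - chance) / (1 - chance)


# Invented example: 1 = "emergency referral", 0 = "no emergency referral".
rater_a = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
rater_b = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
ac1 = gwet_ac1(rater_a, rater_b, categories=[0, 1])
```

Because the chance term shrinks when one category dominates, AC1 stays high under skewed prevalence, which is one reason it is often preferred to Cohen's kappa for decisions such as emergency referral.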

18
Mechanistic Insights into Skin Sympathetic Nerve Activity Dynamics in Healthy Subjects Through a Two-Layer Signal-Analytical and Closed-Loop Physiological Modeling Framework

Lin, R.; Halfwerk, F. R.; Donker, D. W.; Tertoolen, J.; van der Pas, V. R.; Laverman, G. D.; Wang, Y.

2026-04-13 health informatics 10.64898/2026.04.11.26350680 medRxiv
Top 6%
0.1%

Objective: Skin sympathetic nerve activity (SKNA) has emerged as a promising non-invasive surrogate measure of sympathetic drive, but its relevant physiological characteristics remain ill-defined. This observational study aims to investigate its regulatory patterns during rest and Valsalva maneuver (VM) in healthy participants. Method: Using a two-layer strategy integrating signal analysis and physiological modelling, we analyzed data recorded from 41 subjects performing repeated VMs. The observational layer includes time-domain feature comparisons using linear mixed-effect models, and time-varying spectral coherence analysis. The mechanistic layer proposes a mathematical model to investigate whether baroreflex and respiratory modulation are sufficient to reproduce the observed HR and average SKNA (aSKNA) dynamics. Main Results: Mean integrated SKNA (iSKNA) showed more significant changes than HRV for VM-induced effects. We also found that the increase in mean iSKNA during VM varies with BMI and sex. The coherence analysis indicated that iSKNA strongly synchronized with EDR under resting conditions. The proposed model successfully reproduced the main characteristics of aSKNA dynamics, yielding a high median Pearson correlation coefficient (PCC) of 0.80 ([Q1, Q3] = [0.60, 0.91]). In contrast, HR dynamics were only partially captured, with a median PCC of 0.37 ([Q1, Q3] = [0.16, 0.55]). These results suggest that SKNA provides a more direct representation of sympathetic burst dynamics during VM in healthy subjects. Significance: This study provides convergent evidence that SKNA reflects known autonomic regulatory influences in healthy subjects. These findings strengthen the physiological interpretability of SKNA while clarifying its appropriate use as a practical biomarker of sympathetic function.

19
Cochrane Evaluation of (Semi-) Automated Review (CESAR) Methods: Protocol for an adaptive platform study within reviews

Gartlehner, G.; Banda, S.; Callaghan, M.; Chase, J.-A.; Dobrescu, A.; Eisele-Metzger, A.; Flemyng, E.; Gardner, S.; Griebler, U.; Helfer, B.; Jemiolo, P.; Macura, B.; Minx, J. C.; Noel-Storr, A.; Rajabzadeh Tahmasebi, N.; Sharifan, A.; Meerpohl, J.; Thomas, J.

2026-04-15 health informatics 10.64898/2026.04.13.26350802 medRxiv
Top 6%
0.1%

Background: Artificial intelligence (AI) has the potential to improve the efficiency of evidence synthesis and reduce human error. However, robust methods for evaluating rapidly evolving AI tools within the practical workflows of evidence synthesis remain underdeveloped. This protocol describes a study design for assessing the effectiveness, efficiency, and usability of AI tools in comparison to traditional human-only workflows in the context of Cochrane systematic reviews. Methods: Members of the Cochrane Evaluation of (Semi-) Automated Review (CESAR) Methods Project developed an adaptive platform study-within-a-review (SWAR) design, modeled after clinical platform trials. This design employs a master protocol to concurrently evaluate multiple AI tools (interventions) against a standard human-only process (control) across three key review tasks: title and abstract screening, full-text screening, and data extraction. The adaptive framework allows for the addition or removal of AI tools based on interim performance analyses without necessitating a restart of the study. Performance will be assessed using metrics such as accuracy (sensitivity, specificity, precision), efficiency (time on task), response stability, impact of errors, and usability, in alignment with Responsible use of AI in evidence SynthEsis (RAISE) principles. Results: The study will generate comparative data about the performance and usability of specific AI tools employed in a semi- or fully automated manner relative to standard human effort. The protocol provides a flexible framework for the assessment of AI tools in evidence synthesis, addressing the limitations of static, one-time evaluations. Discussion: This study protocol presents a novel methodological approach to addressing the challenges of evaluating AI tools for evidence syntheses. 
By validating entire workflows rather than individual technologies, the findings will establish an evidence base for determining the viability of integrating AI into evidence-synthesis workflows. The adaptive design of this study is flexible and can be adopted by other investigators, ensuring that the evaluation framework remains relevant as new tools emerge.

20
JARVIS, should this study be selected for full-text screening? Performance of a Joint AI-ReViewer Interactive Screening tool for systematic reviews

Barreto, G. H. C.; Burke, C.; Davies, P.; Halicka, M.; Paterson, C.; Swinton, P.; Saunders, B.; Higgins, J. P. T.

2026-04-11 health informatics 10.64898/2026.04.08.26350384 medRxiv
Top 7%
0.0%

Background: Systematic reviews are essential for evidence-based decision making in health sciences but require substantial time and resources for manual processes, particularly title and abstract screening. Recent advances in machine learning and large language models (LLMs) have demonstrated promise in accelerating screening with high recall but are often limited by modest gains in efficiency, mostly due to the absence of a generalisable stopping criterion. Here, we introduce and report preliminary findings on the performance of a novel semi-automated active learning system, JARVIS, that integrates LLM-based reasoning using the PICOS framework, neural network-based classification, and human decision-making to facilitate abstract screening. Methods: Datasets containing author-made inclusion and exclusion decisions from six published systematic reviews were used to pilot the semi-automated screening system. Model performance was evaluated across recall, specificity and area under the precision-recall curve (AUC-PR), using full-text inclusion as the ground truth. Estimated workload and financial savings were calculated by comparing total screening time and reviewer costs across manual and semi-automated scenarios. Results: Across the six review datasets, recall ranged between 98.2% and 100%, and specificity ranged between 97.9% and 99.2% at the defined stopping point. Across iterations, AUC-PR values ranged between 83.8% and 100%. Compared with human-only screening, JARVIS delivered workload savings between 71.0% and 93.6%. When a single reviewer read the excluded records, workload savings ranged between 35.6% and 46.8%. Conclusion: The proposed semi-automated system substantially reduced reviewer workload while maintaining high recall, improving on previously reported approaches. Further validation in larger and more varied reviews, as well as prospective testing, is warranted.
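The screening metrics named in this abstract follow from a confusion matrix against the full-text inclusion decisions. The sketch below is illustrative only: the counts are invented, the function names are hypothetical, and the workload proxy uses record counts, whereas the abstract derives savings from screening time and reviewer costs.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (recall, specificity) for include/exclude screening decisions."""
    recall = tp / (tp + fn)        # share of truly relevant records retained
    specificity = tn / (tn + fp)   # share of irrelevant records correctly excluded
    return recall, specificity


def workload_saving(total_records: int, records_read_by_humans: int) -> float:
    """Fraction of records a human reviewer did not have to read (count-based proxy)."""
    return 1 - records_read_by_humans / total_records


# Invented counts for a single review of 2,000 records.
recall, specificity = screening_metrics(tp=55, fp=44, tn=1900, fn=1)
saving = workload_saving(total_records=2000, records_read_by_humans=400)
```

A stopping rule that halts human screening early raises the saving but risks lowering recall, so the two numbers are best reported together, as the abstract does.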